The dataset includes:
- Year: 1995-2024
- Sports_Facilities: Total number of sports facilities
- People_Doing_Sports_K: Number of people doing sports (in thousands)
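```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the raw data and inspect it
df = pd.read_csv("idman_dataset.csv")
df.head()
df.isnull().sum()

# Rename the columns (assumes the CSV has exactly these four columns, in this order)
# and drop the unused unnamed column
df.columns = ['Year', 'Unnamed', 'Sports_Facilities', 'People_Doing_Sports_K']
df = df.drop('Unnamed', axis=1)
print(df.isnull().sum())

# The file lists years in descending order; sort them ascending
df = df.sort_values('Year')
df
```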
```
Year                     0
Sports_Facilities        0
People_Doing_Sports_K    0
dtype: int64
```

| | Year | Sports_Facilities | People_Doing_Sports_K |
|---|---|---|---|
| 29 | 1995 | 6670.0 | 394.1 |
| 28 | 1996 | 6937.0 | 298.7 |
| 27 | 1997 | 6885.0 | 353.4 |
| 26 | 1998 | 7238.0 | 329.6 |
| 25 | 1999 | 7864.0 | 352.6 |
| 24 | 2000 | 7908.0 | 355.2 |
| 23 | 2001 | 8129.0 | 400.5 |
| 22 | 2002 | 8214.0 | 410.3 |
| 21 | 2003 | 8940.0 | 539.6 |
| 20 | 2004 | 8948.0 | 546.5 |
| 19 | 2005 | 8732.0 | 529.8 |
| 18 | 2006 | 9000.0 | 538.4 |
| 17 | 2007 | 9323.0 | 539.3 |
| 16 | 2008 | 9604.0 | 542.6 |
| 15 | 2009 | 9623.0 | 1617.4 |
| 14 | 2010 | 9491.0 | 1649.8 |
| 13 | 2011 | 9954.0 | 1660.4 |
| 12 | 2012 | 10259.0 | 1678.4 |
| 11 | 2013 | 10574.0 | 1685.1 |
| 10 | 2014 | 10798.0 | 1723.8 |
| 9 | 2015 | 11027.0 | 1724.7 |
| 8 | 2016 | 11215.0 | 1755.4 |
| 7 | 2017 | 11412.0 | 1785.9 |
| 6 | 2018 | 11545.0 | 1827.9 |
| 5 | 2019 | 11674.0 | 1864.8 |
| 4 | 2020 | 11770.0 | 1861.6 |
| 3 | 2021 | 11915.0 | 1897.6 |
| 2 | 2022 | 12156.0 | 1918.8 |
| 1 | 2023 | 12270.0 | 1921.6 |
| 0 | 2024 | 12290.0 | 1898.8 |
I dropped the unnamed column because it contains NaN values and is not needed for this analysis.
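Renaming all four columns by position only works as long as the CSV keeps exactly that layout. A minimal sketch of a name-based alternative, assuming pandas parses the blank header as an `Unnamed: <n>` column (its default for empty header cells). The two-row CSV below is a hypothetical excerpt: the values come from the table above, but the header names may not match the real `idman_dataset.csv`.

```python
import io

import pandas as pd

# Hypothetical excerpt: the second header cell is empty, so pandas
# names that column "Unnamed: 1" and fills it with NaN.
raw_csv = io.StringIO(
    "Year,,Sports_Facilities,People_Doing_Sports_K\n"
    "2024,,12290,1898.8\n"
    "2023,,12270,1921.6\n"
)

df_demo = pd.read_csv(raw_csv)
print(df_demo.columns.tolist())

# Keep every column whose name does not start with "Unnamed";
# this does not depend on the position of the column.
df_demo = df_demo.loc[:, ~df_demo.columns.str.startswith("Unnamed")]
print(df_demo.columns.tolist())
```

Selecting by name keeps working if the unnamed column moves or extra named columns are added later.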
```python
plt.figure(figsize=(10, 6))
plt.plot(df['Year'], df['Sports_Facilities'])
plt.title('Number of Sports Facilities Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Facilities')
plt.grid(True)
plt.show()
```
The number of sports facilities grew from 6,670 in 1995 to 12,290 in 2024, an increase of about 84%. Growth was steady, with only small year-over-year dips in 1997, 2005, and 2010.
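The percentage can be checked directly from the 1995 and 2024 values in the table:

```python
# Endpoint values copied from the table above
start_year, start_value = 1995, 6670
end_year, end_value = 2024, 12290

absolute_change = end_value - start_value             # extra facilities
percent_change = absolute_change / start_value * 100  # relative increase in %
years_elapsed = end_year - start_year                 # 29 yearly steps, 30 data points

print(f"{absolute_change} facilities, {percent_change:.1f}% over {years_elapsed} years")
# prints "5620 facilities, 84.3% over 29 years"
```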
```python
# Dual-axis chart: facilities on the left axis, participation on the right
fig, ax1 = plt.subplots(figsize=(12, 6))
ax1.plot(df['Year'], df['Sports_Facilities'], 'b-o')
ax1.set_xlabel('Year')
ax1.set_ylabel('Sports Facilities', color='blue')
ax2 = ax1.twinx()
ax2.plot(df['Year'], df['People_Doing_Sports_K'], 'r-s')
ax2.set_ylabel('People (thousands)', color='red')
plt.title('Sports Facilities vs Participation Over Time')
plt.show()
```
```python
# Apply seaborn's "whitegrid" style to all matplotlib figures drawn after this point
sns.set_style("whitegrid")
```
```python
import plotly.graph_objects as go
import plotly.express as px

# Create interactive dual-axis chart
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=df['Year'],
    y=df['Sports_Facilities'],
    name='Sports Facilities',
    mode='lines+markers'
))
fig.add_trace(go.Scatter(
    x=df['Year'],
    y=df['People_Doing_Sports_K'],
    name='People Doing Sports (K)',
    mode='lines+markers',
    yaxis='y2'
))
fig.update_layout(
    title='Interactive: Sports Facilities vs Participation',
    yaxis=dict(title='Sports Facilities'),
    yaxis2=dict(title='People (thousands)', overlaying='y', side='right')
)
fig.show()
```
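There is an anomaly between 2008 and 2009: the number of people doing sports jumps from 542.6K to 1,617.4K in a single year and stays at that higher level afterwards, while the number of facilities barely changes.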
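The break year can also be located programmatically instead of by eye. A minimal sketch using pandas `Series.pct_change` on the 2005-2012 participation values copied from the table:

```python
import pandas as pd

# Participation (thousands) for 2005-2012, copied from the table above
participation = pd.Series(
    [529.8, 538.4, 539.3, 542.6, 1617.4, 1649.8, 1660.4, 1678.4],
    index=range(2005, 2013),
    name="People_Doing_Sports_K",
)

# Year-over-year change in %; the first year has no predecessor, so it is NaN
yoy = participation.pct_change() * 100

# idxmax skips the NaN and returns the year with the largest increase
break_year = yoy.idxmax()
print(break_year, round(yoy[break_year], 1))  # prints "2009 198.1"
```

On the full `df`, the same `pct_change()` plus `idxmax()` pattern applies to the `People_Doing_Sports_K` column.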
```python
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Sports_Facilities', y='People_Doing_Sports_K', data=df, s=100)
plt.title('Correlation: Facilities vs Participation')
correlation = df['Sports_Facilities'].corr(df['People_Doing_Sports_K'])
plt.text(0.05, 0.95, f'Correlation: {correlation:.3f}',
         transform=plt.gca().transAxes, fontsize=12)
plt.show()
```
Correlation Scatter Plot
```python
plt.figure(figsize=(6, 4))
sns.heatmap(df[['Sports_Facilities', 'People_Doing_Sports_K']].corr(),
            annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Heatmap')
plt.show()
```
```python
# Year-over-year growth rates in percent (the first year is NaN)
df['Facilities_Growth'] = df['Sports_Facilities'].pct_change() * 100
df['Participation_Growth'] = df['People_Doing_Sports_K'].pct_change() * 100

plt.figure(figsize=(12, 5))
plt.plot(df['Year'], df['Facilities_Growth'], label='Facilities Growth %', marker='o')
plt.plot(df['Year'], df['Participation_Growth'], label='Participation Growth %', marker='s')
plt.title('Year-over-Year Growth Rates')
plt.xlabel('Year')
plt.ylabel('Growth Rate (%)')
plt.legend()
plt.axhline(y=0, color='black', linestyle='--', alpha=0.3)
plt.grid(True)
plt.show()
```
```python
# Participation is already in thousands, so this ratio is facilities per 1,000 participants
df['Facilities_per_1K'] = df['Sports_Facilities'] / df['People_Doing_Sports_K']
fig = px.line(df, x='Year', y='Facilities_per_1K',
              title='Sports Facilities per 1000 Participants',
              markers=True)
fig.update_layout(yaxis_title='Facilities per 1000 People')
fig.show()
```
1. Steady Infrastructure Growth
Azerbaijan expanded its sports infrastructure steadily: the number of facilities grew by about 84%, from 6,670 (1995) to 12,290 (2024), which corresponds to a compound annual growth rate of roughly 2.1%.
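The "average annual growth" depends on how it is averaged. This sketch, using the facility counts copied from the table, compares the compound annual growth rate (CAGR) over the 29 yearly steps with the plain arithmetic mean of the year-over-year rates:

```python
import numpy as np

# Sports facilities per year, 1995-2024, copied from the table above
facilities = np.array([
    6670, 6937, 6885, 7238, 7864, 7908, 8129, 8214, 8940, 8948,
    8732, 9000, 9323, 9604, 9623, 9491, 9954, 10259, 10574, 10798,
    11027, 11215, 11412, 11545, 11674, 11770, 11915, 12156, 12270, 12290,
], dtype=float)

n_steps = len(facilities) - 1  # 29 year-over-year steps

# CAGR: the constant yearly rate that turns the 1995 value into the 2024 value
cagr = (facilities[-1] / facilities[0]) ** (1 / n_steps) - 1

# Arithmetic mean of the individual year-over-year growth rates
yoy = facilities[1:] / facilities[:-1] - 1
mean_yoy = yoy.mean()

print(f"CAGR: {cagr:.2%}, mean YoY: {mean_yoy:.2%}")
```

The CAGR is the better single summary because compounding it from 1995 reproduces the 2024 value exactly; the arithmetic mean of the yearly rates can never be lower than the CAGR and here comes out slightly higher.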
2. The 2008 Data Anomaly
Participation jumped by about 198% between 2008 and 2009 (543K → 1,617K), while the number of facilities grew by only 0.2% (9,604 → 9,623). A near-tripling in a single year with no matching change in infrastructure strongly suggests a change in how participants were counted, which makes figures up to 2008 and from 2009 onward not directly comparable.
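Because the two periods are not directly comparable, one option is to measure participation growth separately on each side of the break. A sketch using the period endpoints from the table; the split between 2008 and 2009 follows the anomaly described above, and the helper `period_growth` is a hypothetical name for illustration:

```python
import pandas as pd

# Participation (thousands) at the edges of each period, copied from the table above
participants_k = pd.Series({1995: 394.1, 2008: 542.6, 2009: 1617.4, 2024: 1898.8})

def period_growth(series: pd.Series, start: int, end: int) -> float:
    """Percentage change between two years of the same series."""
    return (series[end] - series[start]) / series[start] * 100

before = period_growth(participants_k, 1995, 2008)  # before the 2008-2009 break
after = period_growth(participants_k, 2009, 2024)   # after the break
print(f"1995-2008: {before:.1f}%, 2009-2024: {after:.1f}%")
# prints "1995-2008: 37.7%, 2009-2024: 17.4%"
```

Each figure compares values counted the same way, so neither includes the jump itself.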
Conclusion
Key Findings:
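- ✅ 30 years of steady growth in sports facilities (about 84%)
- ⚠️ A likely methodology change between 2008 and 2009 complicates trend analysis
- 📈 Strong positive correlation between facilities and participation, though it partly reflects the shared upward trend and the 2009 jump rather than a proven causal link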
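One way to gauge how much the 2009 jump drives the correlation is to compute it separately for each period as well as for the full series. A sketch with the data copied from the table; the split years are an assumption based on the 2008-2009 break, and `corr_between` is a hypothetical helper:

```python
import pandas as pd

# Full 1995-2024 series, copied from the table above
df_full = pd.DataFrame({
    "Year": list(range(1995, 2025)),
    "Sports_Facilities": [
        6670, 6937, 6885, 7238, 7864, 7908, 8129, 8214, 8940, 8948,
        8732, 9000, 9323, 9604, 9623, 9491, 9954, 10259, 10574, 10798,
        11027, 11215, 11412, 11545, 11674, 11770, 11915, 12156, 12270, 12290,
    ],
    "People_Doing_Sports_K": [
        394.1, 298.7, 353.4, 329.6, 352.6, 355.2, 400.5, 410.3, 539.6, 546.5,
        529.8, 538.4, 539.3, 542.6, 1617.4, 1649.8, 1660.4, 1678.4, 1685.1, 1723.8,
        1724.7, 1755.4, 1785.9, 1827.9, 1864.8, 1861.6, 1897.6, 1918.8, 1921.6, 1898.8,
    ],
})

def corr_between(frame: pd.DataFrame, first: int, last: int) -> float:
    """Pearson correlation of the two series for years first..last (inclusive)."""
    subset = frame[(frame["Year"] >= first) & (frame["Year"] <= last)]
    return subset["Sports_Facilities"].corr(subset["People_Doing_Sports_K"])

results = {
    "1995-2024": corr_between(df_full, 1995, 2024),
    "1995-2008": corr_between(df_full, 1995, 2008),
    "2009-2024": corr_between(df_full, 2009, 2024),
}
for period, r in results.items():
    print(f"{period}: r = {r:.3f}")
```

If the within-period coefficients stay high, the relationship is not just an artifact of the break; since both series also rise over time, even then the correlation does not by itself establish that facilities drive participation.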